ABSTRACT

Two exploratory experiments conducted at System Development Corporation
compared the debugging performance of programers working under conditions
of online and offline access to a computer. These are the first known
studies measuring the performance of programers under controlled
conditions for standard tasks. In the first study, two groups of six
subjects each, comprising a total sample of 12 experienced programers,
debugged two types of programs under online and offline conditions in
accordance with a Latin-Square experimental design. The online
condition was the normal mode of operation for the SDC Time-Sharing System;
the offline condition was a simulated closed shop with a two-hour
turnaround time. In the second study, following a similar experimental
design, two groups of programer trainees (one of four and one of five,
for a total of nine subjects) debugged two standard problems under
interactive and noninteractive conditions. The interactive mode was
the normal SDC Time-Sharing System; the noninteractive mode was a
simulated multiple-console, open-shop system.

Statistically significant results indicated substantially faster
debugging under online conditions in both studies. The results were
ambiguous for central processor time: one study showed less computer
time for debugging in the online mode, and the other showed more.
Perhaps the most important practical finding, overshadowing online/offline
differences, involves the large and striking individual differences
in programer performance. Attempts were made to relate observed
individual differences to objective measures of programer experience
and proficiency through factorial techniques. In line with the
exploratory objectives of these studies, methodological problems
encountered in designing and conducting these types of experiments
are described, limitations of the findings are pointed out, hypotheses
are presented to account for results, and suggestions are made for
further research.


The research reported in this paper was sponsored by
the Advanced Research Projects Agency Information
Processing Techniques Office and was monitored by the
Electronic Systems Division, Air Force Systems Command,
under contract F19628-67-C-0004, Information Processing
Techniques, with the System Development Corporation.


EXPLORATORY  EXPERIMENTAL  STUDIES  COMPARING  ONLINE
AND  OFFLINE  PROGRAMING  PERFORMANCE


Computer programing, today, is a multi-billion dollar industry. Major resources
are being expended on the development of new programing languages, new software
techniques, and improved means for man-computer communications. As computer
power grows, and as computer hardware costs go down with advancing computer
technology, the human costs of computer programing continue to rise and will
probably greatly exceed the cost of hardware in the scheme of things to come.
Amid all these portents and signs of the growing importance and the dominating
role of computer programing in the emerging computer scene, one would expect
that computer programing would be the object of intensive applied scientific
study. This is not the case. There is, in fact, an applied scientific lag in the
study of computer programers and computer programing, a widening and critical
lag that threatens the industry and the profession with the great waste that
inevitably accompanies the absence of systematic and established methods and
findings, and their substitution by anecdotal opinion, vested interests, and
provincialism.

The problem of the applied scientific lag in computer programing is strikingly
highlighted in the field of online versus offline programing. The spectacular
rise of time-shared computing systems over the last few years has raised a
critical issue for many, if not most, managers of computing facilities. Should
they or should they not convert from a batch-processing operation, or from some
other form of noninteractive information processing, to time-shared operations?
Spirited controversy has been generated at professional meetings, in the
literature, and at grass-roots levels, but virtually no experimental comparisons
have been made to objectively test and evaluate these competing alternatives
under controlled conditions. Except for a related study by Gold (1966), which is
in progress, the two experimental studies reported in this article are the
first, to our knowledge, that have appeared on this central issue. They
illustrate the problems and the pitfalls in doing applied experimental work in
computer programing. They spell out some of the key dimensions of the scientific
lag in computer programing, and they provide some useful guidelines for future
work.

Time-sharing systems, because of requirements for expanded hardware and more
extensive software, are generally more expensive than closed-shop systems using
the same central computer. Time-sharing advocates feel that such systems more
than pay for themselves in convenience to the user, in more rapid program
development, and in manpower savings. It appears that most programers who have
worked with both time-sharing and closed-shop systems are enthusiastic about the
online way of life.

Time-sharing, however, has its critics. Their arguments are often directed at
the efficiency of time-sharing; that is, at how much of the computational power
of the machine is actually used for productive data processing as opposed to how
much is devoted to relatively non-productive functions (program swapping, idle
time, etc.). These critics (see Patrick 1963, Emerson 1965, and McDonald 1965)
claim that the efficiency of time-sharing systems is questionable when compared
with modern closed-shop methods or with economical small computers. Since online
systems are presumably more expensive than offline systems, there is little
justification for their use except in those situations where online access is
mandatory for system operations (for example, in realtime command and control
systems). Time-sharing advocates respond to these charges by saying that, even
if time-sharing is more costly with regard to hardware and operating efficiency,
savings in programer man-hours and in the time required to produce working
programs more than offset such increased costs. The critics, however, do not
concede this point either. Many believe that programers grow lazy and adopt
careless and inefficient work habits under time-sharing. In fact, they claim
that instead of improving, programer performance is likely to deteriorate.

The two exploratory studies summarized here are found in Grant and Sackman (1966)
and in Erikson (1966). The original studies should be consulted for technical
details that are beyond the scope of this article. They were performed by the
System Development Corporation for the Advanced Research Projects Agency of the
Department of Defense. The first study is concerned with online versus offline
debugging performance for a group of 12 experienced programers (average of seven
years' experience). The second investigation involved nine programer trainees in
a comparison of interactive versus noninteractive program debugging. The
highlights of each study are discussed in turn, and the composite results are
interpreted in the concluding section. For easier reference, the first
experiment is described as the "Experienced Programer" study, and the second as
the "Programer Trainee" study.

The two experiments were conducted using the SDC Time-Sharing System (TSS) under
the normal online condition and simulated offline or noninteractive conditions.
TSS is a general-purpose system (see Schwartz, Coffman and Weissman, 1964)
similar in many respects to the Project MAC system (see Scherr, 1966) at the
Massachusetts Institute of Technology. Schwartz (1965) has characterized this
class of time-sharing system as providing four important properties to the
user: "instantaneous" response, independent operation for each user, essentially
simultaneous operation for several users, and general-purpose capability.

TSS utilizes an IBM AN/FSQ-32 computer. The following is a general description
of its operation. User programs are stored on magnetic tape or in disc file
memory. When a user wishes to operate his program, he goes to one of several
teletype consoles; these consoles are direct input/output devices to the Q-32.
He instructs the computer, through the teletype, to load and activate his
program. The system then loads the program either from the disc file or from
magnetic tape into active storage (drum memory). All currently operating
programs are stored on drum memory and are transferred, one at a time, in turn,
into core memory for processing. Under TSS scheduling control, each program is
processed for a short amount of time (usually a fraction of a second) and is
then replaced in active storage to await its next turn. A program is
transferred to core only if it requires processing; otherwise it is passed up
for that turn. Thus, a user may spend as much time as he needs thinking about
what to do next without wasting the computational time of the machine. Although
a time-sharing system processes programs sequentially and discontinuously, it
gives users the illusion of simultaneity and continuity because of its high
speed.
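
The turn-taking cycle described above can be illustrated with a small round-robin loop. This is only a sketch under stated assumptions, not the TSS scheduler: the names UserProgram, needs_cpu, and run_round_robin, the half-second quantum, and the waiting_on_user flag are all hypothetical and chosen for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class UserProgram:
    """One user's program held in active (drum) storage. Hypothetical model."""
    name: str
    cpu_needed: float               # seconds of processing still required
    waiting_on_user: bool = False   # True while the user is thinking


def needs_cpu(program):
    """A program is brought into core only if it requires processing."""
    return program.cpu_needed > 0 and not program.waiting_on_user


def run_round_robin(programs, quantum=0.5, max_turns=1000):
    """Give each program a short slice in turn; return the slices granted."""
    queue = deque(programs)
    trace = []
    turns = 0
    while turns < max_turns and any(needs_cpu(p) for p in queue):
        program = queue.popleft()
        if needs_cpu(program):
            used = min(quantum, program.cpu_needed)
            program.cpu_needed -= used
            trace.append((program.name, used))
        # Otherwise the program is passed up for this turn at no CPU cost.
        queue.append(program)       # back to active storage for its next turn
        turns += 1
    return trace


programs = [
    UserProgram("A", 1.0),
    UserProgram("B", 0.5),
    UserProgram("C", 2.0, waiting_on_user=True),   # user is still thinking
]
trace = run_round_robin(programs)
```

In this sketch program C never appears in the trace, which mirrors the point that a user who is thinking at the console consumes no processor time.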

1.  EXPERIENCED  PROGRAMER  STUDY

1.1  EXPERIMENTAL  DESIGN 

The design used in this experiment is illustrated in Figure 1.


                 ONLINE          OFFLINE

GROUP I          Algebra (6)     Maze (6)

GROUP II         Maze (6)        Algebra (6)

TOTALS           (12)            (12)

Figure 1.  Experimental Design for the Experienced
Programer Study

The 2 by 2 Latin-Square design with repeated measures for this experiment should
be interpreted as follows. Two experimental groups were employed with six
subjects in each; the two experimental treatments were online and offline
program debugging; and the Algebra and Maze problems were the two types of
programs that were coded and debugged. Repeated measures were employed in that
each subject solved one problem task under online conditions and the other under
offline conditions, serving as his own control. Note in Figure 1 that each of
the two program problems appears once and only once in each row and column to
meet the requirements of the 2 by 2 Latin-Square. Subjects were assigned to the
two groups at random, and problem order and online/offline order were
counterbalanced.
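
The layout in Figure 1 can be written down as a small lookup table and checked mechanically. The sketch below is illustrative only: the function names, the subject numbering, and the seed argument are assumptions, and the counterbalancing of problem order and online/offline order is not modeled.

```python
import random

TREATMENTS = ("online", "offline")
PROBLEMS = ("Algebra", "Maze")

# Figure 1 as a lookup: group -> {treatment: problem}.
DESIGN = {
    "I": {"online": "Algebra", "offline": "Maze"},
    "II": {"online": "Maze", "offline": "Algebra"},
}


def is_latin_square(design):
    """Each problem must occur exactly once in every row and every column."""
    rows_ok = all(
        sorted(row.values()) == sorted(PROBLEMS) for row in design.values()
    )
    cols_ok = all(
        sorted(row[t] for row in design.values()) == sorted(PROBLEMS)
        for t in TREATMENTS
    )
    return rows_ok and cols_ok


def assign_subjects(subject_ids, seed=None):
    """Randomly split subjects into two equal groups (seed aids repeatability)."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"I": shuffled[:half], "II": shuffled[half:]}


def planned_observations(groups, design):
    """List every (subject, treatment, problem) cell the design calls for."""
    return [
        (subject, treatment, problem)
        for group, members in groups.items()
        for subject in members
        for treatment, problem in design[group].items()
    ]


groups = assign_subjects(range(1, 13), seed=0)    # 12 hypothetical subject ids
observations = planned_observations(groups, DESIGN)
```

With 12 subjects each contributing one online and one offline score, the plan yields 24 observations per criterion measure.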

The statistical treatment for this design involves an analysis of variance to
test for the significance of mean differences between the online and offline
conditions and between the Algebra and Maze problems. There are two analyses of
variance, corresponding to the two criterion measures, one for programer
man-hours spent in debugging and the other for central processor time. A leading
advantage of the Latin-Square design for this experiment is that each analysis
of variance incorporates a total of 24 measurements (12 subjects, each measured
on two problems). This configuration permits maximum pooled sample size and high
statistical efficiency in the analysis of the results, especially desirable
features in view of the small subject samples that were used.

1.2  METHOD 

A number of problems were encountered in the design and conduct of this
experiment. Many are illustrative of problems in experimenting with operational
computer systems, and many stemmed from lack of experimental precedent in this
area. Key problems are described below.

1.2.1  Online and Offline Conditions.  Defining the online condition posed no
problems. Programers debugging online were simply instructed to use TSS in the
normal fashion. All the standard features of the system were available to them
for debugging. Defining the offline condition proved more difficult. It was
desired to provide a controlled and uniform turnaround time for the offline
condition. It was further desired that this turnaround time be short enough so
that subjects could be released to their regular jobs and the experiment
completed in a reasonable amount of time; on the other hand, the turnaround time
had to be long enough to constitute a significant delay. The compromise reached
was two hours, considerably shorter than the turnaround of most offline systems
and yet long enough so that most of the programer-subjects complained about the
delay.

It was decided to simulate an offline system using TSS and the Q-32 by requiring
the programer to submit a work request to a member of the experimental staff to
have his program operated. The work request contained specific instructions
from the programer on the procedures to be followed in running the program,
essentially the same approach used in closed-shop computer facilities. Strictly
speaking, then, this experiment was a comparison between online and simulated
offline operations.

Each programer was required to code his own program, using his own logic, and
to rely on the specificity of the problem requirements for comparable programs.
Program coding procedures were independent of debugging conditions; that is,
regardless of the condition imposed for checkout (online or offline), all
programers coded offline. Programers primarily wrote their programs in JTS
(JOVIAL Time-Sharing, a procedure-oriented language for time-sharing).

1.2.2  Experimental Problems.  Two program problem statements were designed
for the experiment. One problem required the subjects to write a program to
interpret teletype-inserted algebraic equations. Each equation involved a single
dependent variable. The program was required to compute the value of the
dependent variable, given teletype-inserted values for the independent
variables, and to check for specific kinds of errors in teletype input. All
programers were referred to a published source (Samelson and Bauer, 1960) for a
suggested workable logic to solve the problem. Programs written to solve this
problem were referred to as Algebra programs.

The other problem called for writing a program to find the one and only path
through a 20 by 20 cell maze. The programs were required to print out the
designators of the cells constituting the path. Each cell was represented as
an entry in a 400-item table, and each entry contained information on the
directions in which movement was possible from the cell. These programs were
referred to as Maze programs.

1.2.3  Performance Measures.  Debugging time was considered to begin when
the programer had coded and compiled a program with no serious format errors
detected by the compiler. Debugging was considered finished when the subject's
program was able to process, without errors, a standard set of test inputs. Two
basic criterion measures were collected for comparing online and offline
debugging: programer man-hours and central processor (CPU) time.

Man-hours for debugging were actual hours spent on the problem by the programer
(including turnaround time). Hours were carefully recorded by close personal
observation of each programer by the experimental staff in conjunction with a
daily time log kept by the subjects. Discrepancies between observed time and
reported time were resolved by tactful interviewing. TSS keeps its own
accounting records on user activity; these records provided accurate measures of
the central processor time used by each subject. The recorded CPU time included
program execute time, some system overhead time, and times for dumping the
contents of program or system registers.

A variety of additional measures were obtained in the course of the experiment
to provide control data and to obtain additional indices of programer
performance. Control measures included: TSS experience, general programing
experience (excluding TSS experience), type of programing language used (JTS or
machine language), and the number of computer runs submitted by each subject in
the offline condition. Additional programer performance measures included:
man-hours spent on each program until a successful pass was made through the
compiler (called coding time), program size in machine instructions, program
running time for a successful pass through the test data, and scores on the
Basic Programing Knowledge Test (BPKT), a paper-and-pencil test developed by
Berger et al. (1966) at the University of Southern California.

1.3  RESULTS

1.3.1  Criterion Performance.  Table 1 shows the means and standard deviations
for the two criterion variables, debug man-hours and CPU time. These raw-score
values show a consistent and substantial superiority for online debug man-hours:
the offline means are roughly one and a half to three times the online means.
CPU time shows a reverse trend; the offline condition required less CPU time
than the online mode for both problems (about 28 percent less for Algebra and
about 17 percent less for Maze). The standard deviations are comparatively large
in all cases, reflecting extensive individual differences. Are these results
statistically significant with such small samples?

Table 2 shows three types of analysis of variance applied to the Latin-Square
experimental design. The first is a straightforward analysis of raw scores. The
second is an analysis of square-root transformed scores to obtain more normal
distributions. The third is also an analysis of variance on the square-root
scores, but with the covariance associated with programer coding skill
partialled out statistically; that is, individuals were effectively equated on
coding skill so that online/offline differences could be tested more directly.
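
The reason a square-root transformation helps with scores like these can be seen on a small right-skewed sample. The sketch below uses hypothetical hours, not data from the study, and it covers only the transformation step; it does not perform the analysis of variance or the covariance adjustment. The function names are assumptions made for the example.

```python
import math
import statistics


def skewness(values):
    """Population skewness: mean cubed deviation divided by the cubed SD."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def sqrt_transform(values):
    """Square-root transform for non-negative scores such as hours or seconds."""
    return [math.sqrt(v) for v in values]


# Hypothetical debug-hour scores (NOT from the study): most subjects finish
# fairly quickly, while a few take far longer, giving a long upper tail.
hypothetical_hours = [4, 6, 7, 9, 10, 12, 15, 20, 30, 45, 80, 150]

raw_skew = skewness(hypothetical_hours)
sqrt_skew = skewness(sqrt_transform(hypothetical_hours))
```

On this sample the transformed scores remain positively skewed but noticeably less so than the raw scores, which is the sense in which the transformation yields "more normal" distributions.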


Table 1.  Experienced Programer Performance

Debug Man-Hours
                     Algebra                  Maze
                Online    Offline       Online    Offline
     Mean        34.5      50.2           4.0      12.3
     SD          30.5      58.9           4.3       8.7

CPU Time (sec.)
                     Algebra                  Maze
                Online    Offline       Online    Offline
     Mean        1266       907           229       191
     SD           473      1067           175       136
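
The size of the online/offline differences, and of the spread relative to the means, can be checked directly from the Table 1 entries. The sketch below is simple arithmetic on the published means and standard deviations; the dictionary layout and function names are assumptions made for the example.

```python
import statistics

# (mean, standard deviation) pairs transcribed from Table 1.
TABLE_1 = {
    ("debug_hours", "Algebra"): {"online": (34.5, 30.5), "offline": (50.2, 58.9)},
    ("debug_hours", "Maze"): {"online": (4.0, 4.3), "offline": (12.3, 8.7)},
    ("cpu_seconds", "Algebra"): {"online": (1266, 473), "offline": (907, 1067)},
    ("cpu_seconds", "Maze"): {"online": (229, 175), "offline": (191, 136)},
}


def offline_to_online_ratio(cell):
    """Offline mean divided by online mean (above 1 means online used less)."""
    return cell["offline"][0] / cell["online"][0]


def spread_percent(mean_sd):
    """Standard deviation expressed as a percentage of the mean."""
    mean, sd = mean_sd
    return 100 * sd / mean


ratios = {key: offline_to_online_ratio(cell) for key, cell in TABLE_1.items()}
spreads = [
    spread_percent(mean_sd)
    for cell in TABLE_1.values()
    for mean_sd in cell.values()
]
median_spread = statistics.median(spreads)

for (measure, problem), ratio in ratios.items():
    print(f"{measure} {problem}: offline/online mean ratio = {ratio:.2f}")
print(f"median SD as percent of mean = {median_spread:.0f}")
```

The debug man-hour ratios come out near 1.5 (Algebra) and 3.1 (Maze), and the CPU ratios near 0.72 and 0.83. The median spread of about 82 percent appears consistent with the median coefficient of variation reported for this study in Section 2.3.2, although the source does not say which variables entered that median.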

Table 2.  Comparative Results of Three Analyses of Variance

                                        Significance Levels

Performance Measures          Raw Scores   Square Root   Square Root
                                                         With Covariance

1.  Debug Man-Hours
      Online vs. Offline         None          .10           .025
      Algebra vs. Maze           .025          .001          .10

2.  CPU Time
      Online vs. Offline         None          None          None
      Algebra vs. Maze           None          .001          .05

These applications resulted in six analyses of variance (three for each
criterion measure) as shown in Table 2. The columns in Table 2 represent the
three kinds of analysis of variance; the rows show the two criterion measures.
For each analysis of variance, tests for mean differences compared online versus
offline performance, and Algebra versus Maze differences. The entries in the
cells show the level of statistical significance found for these two main
effects for each of the six analyses of variance.

The results in Table 2 reveal key findings for this experiment. The first row
shows results for online versus offline performance as measured by debug
man-hours. The raw-score analysis of variance shows no significant differences.
The analysis on square-root transformed scores shows a 10-percent level of
significance in favor of online performance. The last analysis of variance, with
covariance, on square-root scores, shows statistically significant differences
in favor of the online condition at the .025 level. This progressive trend
toward more clearcut mean differences for shorter debug man-hours with online
performance reflects the increasing statistical control over individual
differences in the three types of analyses. In contrast to debug man-hours, no
significant trend is indicated for online versus offline conditions for CPU
time. If real differences do exist, along the lines indicated in Table 1 for
more CPU time in the online mode, these differences were not strong enough to
show statistical significance with these small samples and with the large
individual differences between programers, even with the square-root and
covariance transformations.

The results for Algebra versus Maze differences were not surprising. The Algebra
task was obviously a longer and harder problem than the Maze task, as indicated
by all the performance measures. The fairly consistent significant differences
between Algebra and Maze scores shown in Table 2 reflect the differential
effects of the three tests of analysis of variance, and, in particular, point up
the greater sensitivity of the square-root transformations over the original raw
scores in demonstrating significant problem differences.

1.3.2  Individual Differences.  The observed ranges of individual differences
are listed in Table 3 for the 10 performance variables measured in this study.
The ratio between the poorest and best values is also shown.


Table 3.  Range of Individual Differences in Programing Performance

      Performance Measure          Poorest Score    Best Score    Ratio

 1.   Debug Hours Algebra               170              6         28:1
 2.   Debug Hours Maze                   26              1         26:1
 3.   CPU Time Algebra (sec.)          3075            370          8:1
 4.   CPU Time Maze (sec.)              541             50         11:1
 5.   Code Hours Algebra                111              7         16:1
 6.   Code Hours Maze                    50              2         25:1
 7.   Program Size Algebra             6137           1050          6:1
 8.   Program Size Maze                3287            651          5:1
 9.   Run Time Algebra (sec.)           7.9            1.6          5:1
10.   Run Time Maze (sec.)              8.0             .6         13:1
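
The ratio column of Table 3 is the poorest score divided by the best score, rounded to a whole number. A short check, using only the poorest and best scores from the table (the list layout and function name are assumptions made for the example):

```python
# (performance measure, poorest score, best score) transcribed from Table 3.
TABLE_3 = [
    ("Debug Hours Algebra", 170, 6),
    ("Debug Hours Maze", 26, 1),
    ("CPU Time Algebra (sec.)", 3075, 370),
    ("CPU Time Maze (sec.)", 541, 50),
    ("Code Hours Algebra", 111, 7),
    ("Code Hours Maze", 50, 2),
    ("Program Size Algebra", 6137, 1050),
    ("Program Size Maze", 3287, 651),
    ("Run Time Algebra (sec.)", 7.9, 1.6),
    ("Run Time Maze (sec.)", 8.0, 0.6),
]


def ratio_to_one(poorest, best):
    """Poorest-to-best ratio rounded to the nearest whole number."""
    return round(poorest / best)


computed = {name: ratio_to_one(poorest, best) for name, poorest, best in TABLE_3}

for name, value in computed.items():
    print(f"{name:26s} {value}:1")
```

For CPU Time Algebra the quotient 3075/370 is about 8.3, so the rounded ratio is 8:1.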

Table 3 points up the very large individual differences, typically by an order
of magnitude, for most performance variables. To paraphrase a nursery rhyme:

     When a programer is good,
     He is very, very good,
     But when he is bad,
     He is horrid.

The "horrid" portion of the performance frequency distribution is the long tail
at the high end, the positively skewed part in which one poor performer can
consume as much time or cost as 5, 10, or 20 good ones. Validated techniques
to detect and weed out these poor performers could result in vast savings of
time, effort, and cost.

To obtain further information on these striking individual differences, an
exploratory factor analysis was conducted on the intercorrelations of 15
performance and control variables in the experimental data. Coupled with visual
inspection of the empirical correlation matrix, the main results were:

a.  A substantial performance factor designated as "programing
    speed," associated with faster coding and debugging,
    less CPU time, and the use of a higher-order language.

b.  A well-defined "program economy" factor marked by shorter
    and faster running programs associated to some extent with
    greater programing experience and with the use of machine
    language rather than higher-order language.

This concludes the description of the method and results of the first study.
The second study on programer trainees follows.

2.  PROGRAMER  TRAINEE  STUDY

2.1  EXPERIMENTAL  DESIGN 

A 2 by 2 Latin-Square design was also used in this experiment. With this
design, as shown in Figure 2, the Sort Routine problem was solved by Group I
(consisting of four subjects) in the noninteractive mode and by Group II
(consisting of the other five subjects) in the interactive mode. Similarly, the
second problem, a Cube Puzzle, was worked by Group I in the interactive mode
and by Group II in the noninteractive mode.


                 INTERACTIVE      NONINTERACTIVE

GROUP I (4)      Cube Puzzle      Sort Routine

GROUP II (5)     Sort Routine     Cube Puzzle

TOTAL            9 Subjects

Figure 2.  Experimental Design for the Programer
Trainee Study


Analysis of variance was used to test the significance of the differences between
the mean values of the two test conditions (interactive and noninteractive) and
the two problems. The first (test conditions) was the central experimental
inquiry, and the other was of interest from the point of view of control.

2.2  METHOD 

Nine programer trainees were randomly divided into two groups, one of four and
one of five. One group coded and debugged the first problem interactively while
the other group did the same problem in a noninteractive mode. The two groups
switched computer system type for the second problem. All subjects used TINT
(Kennedy, 1965) for both problems. (TINT is a dialect of JOVIAL that is used
interpretively with TSS.)

2.2.1  Interactive  and  Noninteractive  Conditions.  "Interactive,"  for  this 
experiment,  meant  the  use  of  TSS  and  the  TINT  language  with  all  of  its  associated 
aids.  No  restrictions  in  the  use  of  this  language  were  placed  upon  the  subjects. 

The noninteractive condition was the same as the interactive except that the
subjects were required to quit after every attempted execution. The subjects ran
their own programs under close supervision to assure that they were not
inadvertently running their jobs in an interactive manner. If a member of the
noninteractive group immediately saw his error and if there were no other
members of the noninteractive group waiting for a teletype, then, after he quit,
he was allowed to log in again without any waiting period. Waiting time for an
available console in the noninteractive mode fluctuated greatly, but typically
involved minutes rather than hours.

2.2.2  Experimental Problems.  The two experimental tasks were relatively
simple problems that were normally given to students by the training staff. The
first involved writing a numerical sort routine, and the second required finding
the arrangement of four specially marked cubes that met a given condition. The
second problem was more difficult than the first, but neither required more than
five days of elapsed time for a solution by any subject. The subjects worked at
each problem until they were able to produce a correct solution with a run of
their program.

2.2.3  Performance Measures.  CPU time, automatically recorded for each
trainee, and programer man-hours spent debugging the problem, recorded in
individual work logs, were the two major measures of performance. Debugging was
assumed to begin when a subject logged in for the first time, that is, after he
had finished coding his program at his desk and was ready for initial runs to
check and test his program.

2.3  RESULTS 

2.3.1  Criterion Performance.  A summary of the results of this experiment
is shown in Table 4. Analysis of variance showed the difference between the
raw-score mean values of debug hours for the interactive and the noninteractive
conditions to be significant at the .13 level. The difference between the two
experimental conditions for mean values of CPU seconds was significant at the
.08 level. In both cases, better performance (faster solutions) was obtained
under the interactive mode. In the previous experiment, the use of square-root
transformed scores and the use of coding hours as a covariate allowed better
statistical control over the differences between individual subjects. No such
result was found in this experiment.

If each of the subjects could be directly compared to himself as he worked with
each of the systems, the problem of matching subjects or subject groups and the
need for extensive statistical analysis could be eliminated. Unfortunately, it
is not meaningful to have the same subject code and debug the same problem twice
and it is extremely difficult to develop different problems that are at the same
level of difficulty. One possible solution to this problem would be to use some
measure of problem difficulty as a normalizing factor. It should be recognized
that the use of any normalizing factor can introduce problems in analysis and
interpretation. It was decided to use one of the more popular of such measures,
namely, the number of instructions in the program. CPU time per instruction and
debug man-hours per instruction were compared on the two problems for each
subject for the interactive and noninteractive conditions. The results showed that
the interactive subjects had significantly lower values on both compute seconds
per instruction (.01 level) and debug hours per instruction (.06 level).
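
The per-instruction normalization described above amounts to a simple ratio. The following is a minimal sketch in Python, not the authors' actual procedure; the function name and the sample figures are hypothetical and are not data from the study.

```python
def per_instruction(raw_score, n_instructions):
    """Divide a raw score (CPU seconds or debug man-hours) by program size.

    n_instructions is the number of instructions in the subject's program,
    used here as a rough stand-in for problem difficulty.
    """
    if n_instructions <= 0:
        raise ValueError("instruction count must be positive")
    return raw_score / n_instructions


# Hypothetical subject (illustrative numbers only): 120 CPU seconds and
# 3.0 debug man-hours spent on a 60-instruction program.
cpu_rate = per_instruction(120.0, 60)    # CPU seconds per instruction
debug_rate = per_instruction(3.0, 60)    # debug man-hours per instruction
```

Dividing by program size lets scores on two problems of different length be compared for the same subject, at the cost of assuming that effort scales with instruction count.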


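Table 4. Programer Trainee Performance

Measure           Problem        Condition        Values
Debug Man-Hours   Sort Routine   Interactive      0.71
Debug Man-Hours   Sort Routine   Noninteractive   [illegible]
CPU Time (sec.)   Sort Routine   Noninteractive   109.1
CPU Time (sec.)   Cube Puzzle    Interactive      290.2   213.0
CPU Time (sec.)   Cube Puzzle    Noninteractive   875.3   392.6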


2.3.2 Individual Differences. One of the key findings of the previous study
was that there were large individual differences between programers. Because of
differences in sampling and scale factors, coefficients of variation were computed
to compare individual differences in both studies. (The coefficient of variation
is expressed as a percentage; it is equal to the standard deviation divided
by the mean, multiplied by 100.) The overall results showed that coefficients
of variation for debug man-hours and CPU time in this experiment were only 16
percentage points smaller than coefficients of variation in the experienced
programer study (median values of 66 percent and 82 percent, respectively). These
observed differences may be attributable, in part, to the greater difficulty level
of the problems in the experienced programer study, and to the much greater range
of programing experience between subjects which tended to magnify individual
programer differences.
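
The coefficient of variation defined above can be computed directly from a list of scores. This is a minimal sketch in Python; the text does not say whether the sample or the population standard deviation was used, so the `sample` switch is an assumption, and the scores shown are made up for illustration.

```python
import statistics


def coefficient_of_variation(scores, sample=True):
    """Return the standard deviation as a percentage of the mean.

    sample=True uses the n-1 (sample) standard deviation; sample=False
    uses the population form. Which one the study used is not stated.
    """
    mean = statistics.mean(scores)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    sd = statistics.stdev(scores) if sample else statistics.pstdev(scores)
    return 100.0 * sd / mean


# Hypothetical debug man-hours for eight subjects (not data from the study).
hours = [2, 4, 4, 4, 5, 5, 7, 9]
cv_population = coefficient_of_variation(hours, sample=False)
```

Because the measure is a ratio, it is unchanged when every score is multiplied by the same constant, which is why it suits comparisons across studies with different scale factors.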

In an attempt to determine if there are measures of skill that can be used as a
preliminary screening tool to equalize groups, data were gathered on the subjects'
grades in the SDC programer training class, and, as mentioned earlier, they were
also given the Basic Programing Knowledge Test (BPKT). Correlations between all
experimental measures, adjusted scores, grades, and the BPKT results were
determined. Except for some spurious part-whole correlations, the results showed no
consistent correlation between performance measures and the various grades and
test scores. The most interesting result of this exploratory analysis, however,
was that class grades and BPKT scores showed substantial intercorrelations.

This is especially notable when only the first of the two BPKT scores is
considered. These correlations ranged between .64 and .83 for Part I of the BPKT;
two out of these four correlations are at the 5 percent level and one exceeds
the 1 percent level of significance even for these small samples. This implies
that the BPKT is measuring the same kinds of skills that are measured in trainee
class performance. It should also be noted that neither class grades nor BPKT
scores would have provided useful predictions of trainee performance in the test
situation that was used in this experiment. This observation may be interpreted
in three basic ways: first, that the BPKT and class grades are valid and that the
problems do not represent general programing tasks; second, that the problems
are valid, but that the BPKT and class grades are not indicative of working
programer performance; or third, that interrelations between the BPKT and class
grades do in fact exist with respect to programing performance, but that the
intercorrelations are only low to moderate and so cannot be detected with the very
small samples used in these experiments. The results of these studies are
ambiguous with respect to these three hypotheses; further investigation is
required to determine whether one or some combination of them will hold.
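
To see why a correlation as high as .64 can miss significance in a sample this small, the usual t test for a correlation coefficient can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: it assumes all nine trainees entered each correlation (7 degrees of freedom) and a two-tailed test, neither of which the text states. The critical values are the standard tabled values of Student's t for 7 degrees of freedom.

```python
import math

# Two-tailed critical values of Student's t for 7 degrees of freedom
# (assumes n = 9 subjects; the sample size per correlation is an assumption).
T_CRIT_05 = 2.365
T_CRIT_01 = 3.499


def t_statistic(r, n):
    """Convert a Pearson correlation r from n paired scores to a t value."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def label_for_nine_subjects(r):
    """Classify r against the 5 and 1 percent levels for n = 9."""
    t = abs(t_statistic(r, 9))
    if t >= T_CRIT_01:
        return "significant at .01"
    if t >= T_CRIT_05:
        return "significant at .05"
    return "not significant at .05"


low_end = label_for_nine_subjects(0.64)   # lower end of the reported range
high_end = label_for_nine_subjects(0.83)  # upper end of the reported range
```

Under these assumptions the two ends of the reported range fall on opposite sides of the thresholds, which illustrates how little power such small samples have to detect low-to-moderate correlations.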

3. INTERPRETATION

Before drawing any conclusions from the results, consider the scope of the two
studies. Each dealt with a small number of subjects; performance measures were
marked by large error variance and wide-ranging individual differences, which
made statistical inference difficult and risky. The subject skill range was
considerable, from programer trainees in one study to highly experienced research
and development programers in the other. The programing languages included one
machine language and two subsets of JOVIAL, a higher-order language. In both
experiments TSS served as the online or interactive condition whereas the
offline or noninteractive mode had to be simulated on TSS according to specified
rules. Only one facility, TSS, was used for both experiments. The problems
ranged from the conceptually simple tasks administered to the programer trainees
to the much more difficult problems given to the experienced programers. The
representativeness of these problems for programing tasks is unknown. The point
of this thumbnail sketch of the two studies is simply to emphasize their
tentative, exploratory nature: at best they cover a highly circumscribed set of
online and offline programing behaviors.

The interpretation of the results is discussed under three broad areas,
corresponding to three leading objectives of these two studies: comparison of
online and offline programing performance, analysis of individual differences in
programing proficiency, and implications of the methodology and findings for
future research.

3.1  ONLINE  VERSUS  OFFLINE  PROGRAMING  PERFORMANCE 

On the basis of the concrete results of these experiments, the online conditions
resulted in substantially and, by and large, significantly better performance
for debug man-hours than the offline conditions. The crucial question is: to
what extent may these results be generalized to other computing facilities, to
other programers, to varying levels of turnaround time, and to other types of
programing problems? Provisional answers to these four questions highlight
problem areas requiring further research.

The online/offline comparisons were made in a time-shared computing facility in
which the online condition was the natural operational mode, whereas offline
conditions had to be simulated. It might be argued that in analogous
experiments, conducted with a batch-processing facility, with real offline conditions
and simulated online conditions, the results might be reversed. One way to
surmount this simulation bias is to conduct an experiment in a hybrid facility that
uses both time-sharing and batch-processing procedures on the same computer so
that neither has to be simulated. Another approach is to compare facilities
matched on type of computer, programing languages, compilers and other tools for
coding and debugging, but differing in online and offline operations. It might
also be argued that the use of new and different programing languages, methods
and tools might lead to entirely different results.

The  generalization  of  these  results  to  other  programers  essentially  boils  down 
to  the  representativeness  of  the  experimental  samples  with  regard  to  an  objective 
and  well-defined  criterion  population.  A  universally  accepted  classification 
scheme  for  programers  does  not  exist,  nor  are  there  accepted  norms  with  regard 
to  biographical,  educational  and  job-experience  data. 

In certain respects, the differences between online and offline performance
hinge on the length and variability of turnaround time. The critical experimental
question is not whether one mode is superior to the other mode, because, all other
things equal, offline facilities with long turnaround times consume more elapsed
programing time than either online facilities or offline facilities with short
turnaround times. The critical comparison is with online versus offline
operations that have short response times. The data from the experienced programer
study suggest the possibility that, as offline turnaround time approaches zero,
the performance differential between the two modes with regard to debug man-hours
tends to disappear. The programer trainee study, however, tends to refute this
hypothesis since the mean performance advantage of the interactive mode was
considerably larger than waiting time for computer availability. Other experimental
studies need to be conducted to determine whether online systems offer a
man-hour performance advantage above and beyond the elimination of turnaround time
in converting from offline to online operations.

The last of the four considerations crucial to any generalization of the
experimental findings, type of programing problem, presents a baffling obstacle. How
does an investigator select a "typical" programing problem or set of problems?
No suitable classification of computing systems exists, let alone a
classification of types of programs. Scientific vs. business, online vs. offline,
automated vs. semiautomated, realtime vs. nonrealtime: these and many other tags for
computer systems and computer programs are much too gross to provide systematic
classification. In the absence of a systematic classification of computer
programs with respect to underlying skills, programing techniques and applications,
all that can be done is to extend the selection of experimental problems to
cover a broader spectrum of programing activity.

The preceding discussion has been primarily concerned with consistent findings
on debug man-hours for both experiments. The opposite findings in the two studies
with regard to CPU time require some comment. The results of the programer
trainee study seem to indicate that online programing permits the programer to
solve his problem in a direct, uninterrupted manner which results not only in
less human time but also less CPU time. The programer does not have to "warm
up" and remember his problem in all its details if he has access to the computer
whenever he needs it. In contrast, the apparent reduction of CPU time in the
experienced programer study under the offline condition suggests an opposing
hypothesis; that is, perhaps there is a deliberate tradeoff, on the part of the
programer, to use more machine time, in an exploratory trial-and-error manner,
to reduce his own time and effort in solving his problem. The results of these
two studies are ambiguous with respect to these opposing hypotheses. One or
both of them may be true, to different degrees under different conditions. Then
again, perhaps these explanations are too crude to account for complex
problem-solving in programing tasks. More definitive research is needed.

3.2  INDIVIDUAL  DIFFERENCES 

These studies revealed large individual differences between high and low
performers, often by an order of magnitude. It is apparent from the spread of the
data that very substantial savings can be effected by successfully detecting
low performers. Techniques measuring individual programing skills should be
vigorously pursued, tested and evaluated, and developed on a broad front for the
growing variety of programing jobs.

These two studies suggest that such paper-and-pencil tests may work best in
predicting the performance of programer trainees and relatively inexperienced
programers. The observed pattern was one of substantive correlations of BPKT test
scores with programer trainee class grades, but no detectable correlation with
experienced programer performance. These tentative findings on our small samples
are consistent with internal validation data for the BPKT. The test discriminates
best between low experience levels and fails to discriminate significantly among
highest experience levels. This situation suggests that general programing
skill may dominate early training and initial on-the-job experience, but that
such skill is progressively transformed and displaced by more specialized skills
with increasing experience.

If programers show such large performance differences, even larger and more
striking differences may be expected in general user performance levels with the
advent of information utilities (such as large networks of time-shared computing
facilities with a broad range of information services available to the general
public). The computer science community has not recognized (let alone faced up
to) the problem of anticipating and dealing with very large individual
differences in tasks involving man-computer communications for the general public.

In an attempt to explain the results of both studies in regard to individual
differences and to offer a framework for future analyses of individual differences
in programer skills, a differentiation hypothesis is offered, as follows: when
programers are first exposed to and indoctrinated in the use of computers, and
during their early experience with computers, a general factor of programer
proficiency is held to account for a large proportion of observed individual
differences. However, with the advent of diversified and extended experience, the
general programing skill factor differentiates into separate and relatively
independent factors related to specialized experience.

From a broader and longer-range perspective, the trend in computer science and
technology is toward more diversified computers, programing languages and
computer applications. This general trend toward increasing variety is likely to
require an equivalent diversification of human skills to program such systems.

A  pluralistic  hypothesis,  such  as  the  suggested  differentiation  hypothesis, 
seems  more  appropriate  to  anticipate  and  deal  with  this  type  of  technological 
evolution,  not  only  for  programers,  but  for  the  general  user  of  computing 
facilities. 

3.3  FUTURE  RESEARCH 

These studies began with a rather straightforward objective: the comparison of
online and offline programer debugging performance under controlled conditions.
But in order to deal with the online/offline comparison, it became necessary
to consider many other factors related to man-machine performance. For example,
it was necessary to look into the characteristics and correlates of individual
differences. We had to recognize that there was no objective way to assess the
representativeness of the various experimental problems for data processing in
general. The results were constrained to a single computing facility normally
using online operations. The debugging criterion measures showed relationships
with other performance, experience and control variables that demanded at least
preliminary explanations. Programing languages had to be accounted for in the
interpretation of the results. The original conception of a direct statistical
comparison between online and offline performance had to give way to multivariate
statistical analysis to interpret the results in a more meaningful context.

In short, our efforts to measure online/offline programing differences in an
objective manner were severely constrained by the lack of substantive scientific
information on computer programing performance, that is, by the applied
scientific lag in computer programing, which brings us back to the opening
theme. This lag is not localized to computer programing; it stems from a more
fundamental experimental lag in the general study of man-computer communications.
The case for this assertion involves a critical analysis of the status and
direction of computer science which is beyond the scope of this article; this
analysis is presented elsewhere (Sackman, 1967). In view of these various
considerations, it is recommended that future experimental comparisons of online
and offline programing performance be conducted within the broad framework of
programer performance, and not as a simple dichotomy existing in a separate
data-processing world of its own. It is far more difficult and laborious to construct
a scientific scaffold for the man-machine components and characteristics of
programer performance than it is to try to concentrate exclusively on a rigorous
comparison of online and offline programing.

Eight  broad  areas  for  further  research  are  indicated: 

a. Development of empirical, normative data on computing system
performance with respect to type of application, man-machine
environment, and types of computer programs in relation to leading tasks
in object systems.

b. Comparative experimental studies of computer facility performance,
such as online, offline, and hybrid installations, systematically
permuted against broad classes of program languages (machine-oriented,
procedure-oriented and problem-oriented languages), and
representative classes of programing tasks.

c. Development of cost-effectiveness models for computing facilities,
incorporating man and machine elements, with greater emphasis on
empirically validated measures of effectiveness, and less emphasis
on abstract models than has been the case in the past.

d. Programer job and task analysis based on representative sampling
of programer activities, leading toward the development of
empirically validated and updated job-classification procedures.

e. Systematic collection, analysis and evaluation of the empirical
characteristics, correlates, and variation associated with
individual performance differences for programers, including
analysis of team effectiveness and team differences.

f. Development of a variety of paper-and-pencil tests, such as the
Basic Programing Knowledge Test, for assessment of general and
specific programer skills in relation to representative, normative
populations.

g. Detailed case histories on the genesis and course of programer
problem-solving, the frequency and nature of human and machine
errors in the problem-solving process, the role of machine
feedback and reinforcement in programer behavior, and the delineation
of critical programer decision points in the life-cycle of the
design, development and installation of computer programs.

h. And finally, integration of the above findings into the broader
arena of man-computer communication for the general user.

More  powerful  applied  research  on  programer  performance,  including  experimental 
comparisons  of  online  and  offline  programing,  will  require  the  development  in 
depth  of  basic  concepts  and  procedures  for  the  field  as  a  whole — a  development 
that  can  only  be  achieved  by  a  concerted  effort  to  bridge  the  scientific  gap 
between  knowledge  and  application. 